Selenium 1

The libraries covered before (BeautifulSoup, lxml, Scrapy) provide a user-friendly interface for getting data from the web using the HTML source of a page. Yet sometimes the HTML source is not directly available: it is generated (usually by JavaScript) only in response to user input. For example, the data on a page may appear only after clicking a button, filling in a form, or choosing a value from a filter. A simple request to the URL will not return the data, as no user interaction has taken place. In this case, one should write Python code that drives a web browser and performs those interactions. Many libraries provide this functionality, but we will concentrate on one of the most popular among them, Selenium. First of all, you need to have it installed by running the following command in the command prompt:

pip install selenium

Once Selenium is installed, you need to download the webdriver for your browser to your local directory. For example, if your notebook is inside the Data_Scraping folder and you are using the Chrome or Firefox web browser, download the corresponding driver (ChromeDriver for Chrome, geckodriver for Firefox) into that folder.

Alright, you are now ready to move to the code. Let's write a script that opens the Chrome browser, goes to www.inventwithpython.com, finds the hyperlink titled "Read It Online" (located directly by its text) and clicks on it.


In [2]:
from selenium import webdriver
# replace Chrome() below with Firefox() if that is the driver you decided to use
browser = webdriver.Chrome()
url = 'http://inventwithpython.com'
browser.get(url)  # load the page in the browser window
our_element = browser.find_element_by_link_text('Read It Online')  # <a> whose text matches exactly
print(type(our_element))  # a WebElement; a bare type(...) in the middle of a cell displays nothing
our_element.click() # follows the "Read It Online" link

In [3]:
browser.close()
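
The cell above leaves the browser window open if any step fails before `browser.close()` is reached. The sketch below wraps the same steps in a function with guaranteed cleanup. `open_and_click` is a hypothetical helper name, not part of Selenium. The lookup is written as `find_element('link text', ...)`, the generic call behind `find_element_by_link_text`; that spelling is chosen on the assumption that you may be on Selenium 4.3 or later, where the `find_element_by_*` shortcuts are no longer available.

```python
def open_and_click(browser, url, link_text):
    """Open url, click the link whose visible text is link_text, return the new URL.

    browser is any started WebDriver, for example webdriver.Chrome().
    The browser is always shut down, even when the link is not found.
    """
    try:
        browser.get(url)
        # 'link text' is the string value of selenium's By.LINK_TEXT constant
        link = browser.find_element('link text', link_text)
        link.click()
        return browser.current_url
    finally:
        browser.quit()  # ends the session and closes every window
```

With a real driver this could be called as `open_and_click(webdriver.Chrome(), 'http://inventwithpython.com', 'Read It Online')`.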

Let's do a similar task for Yahoo. These are the steps to take:

  • open the Chrome browser,
  • go to the Yahoo Mail login page,
  • find the username field and fill it in with your e-mail address,
  • click on the next button,
  • find the password field and fill it in with your password,
  • click submit.

In [4]:
from selenium import webdriver
browser = webdriver.Chrome()
browser.get('https://mail.yahoo.com')  # Yahoo Mail login page
email_element = browser.find_element_by_id('login-username')  # username field
email_element.send_keys('hrantdavtyan@yahoo.com')  # type the e-mail address
next_button_element = browser.find_element_by_id('login-signin')  # the next button
next_button_element.click()
password_element = browser.find_element_by_id('login-passwd')  # password field; this lookup fails below
password_element.send_keys('my_password')
password_element.submit()  # submit the form that contains the password field


---------------------------------------------------------------------------
NoSuchElementException                    Traceback (most recent call last)
<ipython-input-4-554959684b4a> in <module>()
      6 next_button_element = browser.find_element_by_id('login-signin')
      7 next_button_element.click()
----> 8 password_element = browser.find_element_by_id('login-passwd')
      9 password_element.send_keys('my_password')
     10 password_element.submit()

C:\Program Files\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.pyc in find_element_by_id(self, id_)
    280             driver.find_element_by_id('foo')
    281         """
--> 282         return self.find_element(by=By.ID, value=id_)
    283 
    284     def find_elements_by_id(self, id_):

C:\Program Files\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.pyc in find_element(self, by, value)
    782         return self.execute(Command.FIND_ELEMENT, {
    783             'using': by,
--> 784             'value': value})['value']
    785 
    786     def find_elements(self, by=By.ID, value=None):

C:\Program Files\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.pyc in execute(self, driver_command, params)
    247         response = self.command_executor.execute(driver_command, params)
    248         if response:
--> 249             self.error_handler.check_response(response)
    250             response['value'] = self._unwrap_value(
    251                 response.get('value', None))

C:\Program Files\Anaconda2\lib\site-packages\selenium\webdriver\remote\errorhandler.pyc in check_response(self, response)
    191         elif exception_class == UnexpectedAlertPresentException and 'alert' in value:
    192             raise exception_class(message, screen, stacktrace, value['alert'].get('text'))
--> 193         raise exception_class(message, screen, stacktrace)
    194 
    195     def _value_or_default(self, obj, key, default):

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"id","selector":"login-passwd"}
  (Session info: chrome=58.0.3029.110)
  (Driver info: chromedriver=2.30.477700 (0057494ad8732195794a7b32078424f92a5fce41),platform=Windows NT 6.1.7601 SP1 x86_64)

In [ ]:
browser.close()
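
The `NoSuchElementException` above is raised by the lookup of `login-passwd`, which runs immediately after the click on the next button. One plausible cause (an assumption, since the notebook does not diagnose it) is that the password field had not been rendered yet when the lookup ran; the id may also simply differ on the live page. The sketch below retries a lookup until it succeeds or a timeout expires. `wait_for` is a hypothetical helper written for illustration; Selenium ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose.

```python
import time

def wait_for(find, timeout=10.0, poll=0.5, retry_on=Exception):
    """Call find() until it returns without raising, or until timeout seconds pass.

    find     -- zero-argument callable that performs the lookup
    retry_on -- exception type(s) that mean "not there yet"; with selenium
                this would be NoSuchElementException
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except retry_on:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last lookup error
            time.sleep(poll)  # pause before the next attempt
```

In the cell above, the failing line could then become `password_element = wait_for(lambda: browser.find_element_by_id('login-passwd'), retry_on=NoSuchElementException)`, with `NoSuchElementException` imported from `selenium.common.exceptions`.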

Cheat sheet - sourced from the "Automate the Boring Stuff with Python" book, chapter 11

Table 1: Selenium’s WebDriver Methods for Finding Elements

Method name                                         WebElement object/list returned

browser.find_element_by_class_name(name)            Elements that use the CSS class name
browser.find_elements_by_class_name(name)

browser.find_element_by_css_selector(selector)      Elements that match the CSS selector
browser.find_elements_by_css_selector(selector)

browser.find_element_by_id(id)                      Elements with a matching id attribute value
browser.find_elements_by_id(id)

browser.find_element_by_link_text(text)             <a> elements that completely match the text provided
browser.find_elements_by_link_text(text)

browser.find_element_by_partial_link_text(text)     <a> elements that contain the text provided
browser.find_elements_by_partial_link_text(text)

browser.find_element_by_name(name)                  Elements with a matching name attribute value
browser.find_elements_by_name(name)

browser.find_element_by_tag_name(name)              Elements with a matching tag name (case insensitive; an <a> element is matched by 'a' and 'A')
browser.find_elements_by_tag_name(name)
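
Every method in Table 1 is a shortcut for the generic `find_element(by, value)` / `find_elements(by, value)` pair, as the traceback earlier shows (`find_element_by_id` simply returns `self.find_element(by=By.ID, value=id_)`). Selenium 4.3 and later no longer provide the shortcuts, so the generic form is the one to rely on there. The sketch below maps each shortcut suffix to its locator string; `LOCATORS` and `find` are hypothetical names used only for illustration.

```python
# Suffix of each Table 1 method mapped to the locator string that
# selenium exposes as a constant on By (By.ID == 'id', and so on).
LOCATORS = {
    'class_name': 'class name',
    'css_selector': 'css selector',
    'id': 'id',
    'link_text': 'link text',
    'partial_link_text': 'partial link text',
    'name': 'name',
    'tag_name': 'tag name',
}

def find(browser, strategy, value, many=False):
    """Generic replacement for browser.find_element(s)_by_<strategy>(value)."""
    by = LOCATORS[strategy]
    if many:
        # returns a list, which is empty when nothing matches
        return browser.find_elements(by, value)
    # raises NoSuchElementException when nothing matches
    return browser.find_element(by, value)
```

For example, `find(browser, 'id', 'login-username')` stands in for `browser.find_element_by_id('login-username')`.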

Table 2: WebElement Attributes and Methods

Attribute or method   Description

tag_name              The tag name, such as 'a' for an <a> element
get_attribute(name)   The value of the element's attribute with the given name, such as get_attribute('href') for a link
text                  The text within the element, such as 'hello' in <span>hello</span>
clear()               For text field or text area elements, clears the text typed into it
is_displayed()        Returns True if the element is visible; otherwise returns False
is_enabled()          For input elements, returns True if the element is enabled; otherwise returns False
is_selected()         For checkbox or radio button elements, returns True if the element is selected; otherwise returns False
location              A dictionary with keys 'x' and 'y' for the position of the element in the page
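
As a rough illustration of how the Table 2 members are read in practice, the sketch below collects them into a plain dictionary. `describe` is a hypothetical helper, and the default attribute names (`href`, `id`, `class`) are only an assumption about what is worth inspecting.

```python
def describe(element, attributes=('href', 'id', 'class')):
    """Collect the Table 2 properties of a WebElement into a plain dict."""
    info = {
        'tag': element.tag_name,             # e.g. 'a' for an <a> element
        'text': element.text,                # text between the tags
        'displayed': element.is_displayed(),
        'enabled': element.is_enabled(),
        'location': element.location,        # {'x': ..., 'y': ...}
        'attributes': {},
    }
    for name in attributes:
        # get_attribute returns None when the element has no such attribute
        info['attributes'][name] = element.get_attribute(name)
    return info
```

Calling `describe(our_element)` on the link found in the first example would return its tag name, text, visibility and the selected attributes in one object, which is convenient for printing while exploring a page.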

Table 3: Commonly Used Variables in the selenium.webdriver.common.keys Module

Attributes                                          Meanings

Keys.DOWN, Keys.UP, Keys.LEFT, Keys.RIGHT           The keyboard arrow keys
Keys.ENTER, Keys.RETURN                             The ENTER and RETURN keys
Keys.HOME, Keys.END, Keys.PAGE_DOWN, Keys.PAGE_UP   The home, end, pagedown, and pageup keys
Keys.ESCAPE, Keys.BACK_SPACE, Keys.DELETE           The ESC, BACKSPACE, and DELETE keys
Keys.F1, Keys.F2, ..., Keys.F12                     The F1 to F12 keys at the top of the keyboard
Keys.TAB                                            The TAB key
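
The constants in Table 3 are passed to `send_keys()` in the same way as ordinary text. The sketch below types a string into a field and then presses a special key. `type_then_press` is a hypothetical helper; it takes the key as a parameter so that the function itself needs no Selenium import.

```python
def type_then_press(field, text, key):
    """Replace the content of a text field with text, then press a special key.

    field -- a WebElement for a text input
    key   -- one of the Keys constants from Table 3, e.g. Keys.ENTER
    """
    field.clear()               # drop whatever is already in the field
    field.send_keys(text, key)  # send_keys accepts several values and types them in order
```

With Selenium installed, a call might look like `type_then_press(password_element, 'my_password', Keys.ENTER)` after `from selenium.webdriver.common.keys import Keys`.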

Table 4: Methods for Clicking Browser Buttons

Method name         Description

browser.back()      Clicks the Back button.
browser.forward()   Clicks the Forward button.
browser.refresh()   Clicks the Refresh/Reload button.
browser.quit()      Closes every browser window and ends the WebDriver session.
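
A small sketch of the Table 4 navigation methods in use: visit a page, note its title, and step back to where the browser was before. `peek` is a hypothetical helper name; `browser.title` is the WebDriver property holding the current page title.

```python
def peek(browser, url):
    """Visit url, read its title, then return to the previously open page."""
    browser.get(url)
    title = browser.title  # title of the page that was just loaded
    browser.back()         # same effect as clicking the Back button
    return title
```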